Regression Analysis for Rival Penalized Competitive Learning Binary Tree
Abstract
The main aim of this paper is to develop a regression model that describes the relationship between index efficiency and the parameters of the Rival Penalized Competitive Learning Binary Tree (RPCL-b-tree). The RPCL-b-tree is a hierarchical indexing structure built with a hierarchical RPCL clustering implementation, which transforms the feature space into a sequence of nested clusters. Based on the RPCL-b-tree, an efficient nearest-neighbor search for a query can be performed with the branch-and-bound algorithm. The index efficiency of an RPCL-b-tree depends on a set of parameters: the leaf node size of the tree, the number of retrieved objects per search, the feature dimensionality, and the database size. To formulate this relationship, we develop a nonlinear regression model with two components. One describes the relationship between index efficiency and the number of retrieved objects per search; the other describes the relationship between index efficiency and the leaf node size of an RPCL-b-tree. Both components account for the influence of the database size and the feature dimensionality. Our experimental results show that the proposed regression model has good convergence and high generalization ability. Moreover, the coefficients of this regression model are interpretable: their values directly reflect the index efficiency of an RPCL-b-tree. Using the regression model and its estimated coefficients, we can easily analyze the index efficiency of an RPCL-b-tree before it is built. Because the parameters of different kinds of indexing structures are very similar, this model is also suitable for analyzing other kinds of indexing structures. Thus, it is a powerful tool for constructing the optimal RPCL-b-tree, or other kinds of indexing structures, for a database.
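To make the branch-and-bound search mentioned in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes a plain 2-means split as a stand-in for the hierarchical RPCL clustering step, and it bounds each cluster by a center and a radius. All names (`Node`, `split_two`, `build`, `knn`) are hypothetical. The count of visited points is only an illustrative proxy for search cost, not the paper's definition of index efficiency.

```python
import heapq
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


class Node:
    """A cluster bounded by a center and a covering radius."""

    def __init__(self, points):
        self.center = mean(points)
        self.radius = max(dist(self.center, p) for p in points)
        self.points = points  # kept only at leaves
        self.children = []


def split_two(points, rng, iters=10):
    """Assumption: plain 2-means, standing in for one RPCL clustering step."""
    c0, c1 = rng.sample(points, 2)
    for _ in range(iters):
        g0 = [p for p in points if dist(p, c0) <= dist(p, c1)]
        g1 = [p for p in points if dist(p, c0) > dist(p, c1)]
        if not g0 or not g1:
            break
        c0, c1 = mean(g0), mean(g1)
    if not g0 or not g1:  # degenerate split: fall back to halving
        half = len(points) // 2
        return points[:half], points[half:]
    return g0, g1


def build(points, leaf_size, rng):
    """Recursively split until every leaf holds at most leaf_size points."""
    node = Node(points)
    if len(points) > leaf_size:
        left, right = split_two(points, rng)
        node.children = [build(left, leaf_size, rng),
                         build(right, leaf_size, rng)]
        node.points = []
    return node


def knn(root, query, k):
    """Branch-and-bound k-NN; returns (sorted (distance, point) list, points visited)."""
    best = []  # max-heap of the k best so far, stored as (-distance, point)
    visited = 0

    def visit(node):
        nonlocal visited
        # Lower bound on the distance from the query to any point in this node.
        bound = max(0.0, dist(query, node.center) - node.radius)
        if len(best) == k and bound >= -best[0][0]:
            return  # prune: nothing here can beat the current k-th best
        if not node.children:
            for p in node.points:
                visited += 1
                d = dist(query, p)
                if len(best) < k:
                    heapq.heappush(best, (-d, p))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, p))
            return
        # Descend into the closer child first so the bound tightens early.
        for child in sorted(node.children, key=lambda c: dist(query, c.center)):
            visit(child)

    visit(root)
    return sorted((-nd, p) for nd, p in best), visited
```

Under these assumptions, calling `build(points, leaf_size, rng)` and then `knn(root, query, k)` for different values of `leaf_size`, `k`, dimensionality, and database size gives a rough feel for how those parameters affect the number of points examined per search.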
Similar resources
Rival Penalized Competitive Learning Based Separator on Binary Sources Separation
This paper presents an approach named Rival Penalized Competitive Learning based Binary Source Separator (RPCL-BSS), which has three major advantages: (1) fast in implementation, (2) able to automatically determine the number of binary sources, and (3) able to reduce the noise effects. Experiments have shown that the RPCL-BSS algorithm can not only find out the correct number of sources quickly, but ...
The Mahalanobis Distance Based Rival Penalized Competitive Learning Algorithm
The rival penalized competitive learning (RPCL) algorithm has been developed to make the clustering analysis on a set of sample data in which the number of clusters is unknown, and recent theoretical analysis shows that it can be constructed by minimizing a special kind of cost function on the sample data. In this paper, we use the Mahalanobis distance instead of the Euclidean distance in the c...
A New Competitive Learning Algorithm for Data Clustering
This paper presents a new competitive learning algorithm for data clustering, named the dynamically penalized rival competitive learning algorithm (DPRCA). It is a variant of the rival penalized competitive algorithm [1] and it performs appropriate clustering without knowing the clusters number, by automatically driving extra seed points far away from the input data set. It doesn’t have the "de...
Expectation-MiniMax: A General Penalized Competitive Learning Approach to Clustering Analysis
In the literature, the Rival Penalized Competitive Learning (RPCL) algorithm (Xu et al. 1993) and its variants perform clustering analysis well without knowing the cluster number. However, such a penalization scheme is heuristically proposed without any theoretical guidance. In this paper, we propose a general penalized competitive learning approach named Expectation-MiniMax (EMM) Learning that...
Convergence Analysis of Rival Penalized Competitive Learning (RPCL) Algorithm
This paper analyzes the convergence of the Rival Penalized Competitive Learning (RPCL) algorithm via a cost function. It is shown that as the RPCL process decreases the cost to a global minimum, the correct number of weight vectors will converge to the respective centers of the clusters in the sample data, while the others diverge.
Publication date: 2000